Overview
Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 15000 |
| Missing cells | 2890 |
| Missing cells (%) | 1.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 7.9 MiB |
| Average record size in memory | 551.5 B |
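The missing-cell percentage above follows from the raw counts in the same table. A minimal sketch of that arithmetic, assuming percentages are rounded to one decimal place (the variable names are illustrative and not part of the report):

```python
# Values copied from the "Dataset statistics" table.
n_variables = 18
n_observations = 15000
missing_cells = 2890

# Every (row, column) pair counts as one cell.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
```

All 2890 missing cells come from a single column (POSSIBLENterm), which is why the dataset-level share is small while that column's own missing share is 19.3%.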
Variable types
| Text | 2 |
|---|---|
| Numeric | 11 |
| Boolean | 1 |
| Categorical | 4 |
Alerts
| Alert | Type |
|---|---|
| POSSIBLENterm has constant value "True" | Constant |
| Insidesource has constant value "TMHMM2.0" | Constant |
| TMhelixsource has constant value "TMHMM2.0" | Constant |
| Outsidesource has constant value "TMHMM2.0" | Constant |
| ExpnumberofAAsinTMHs is highly overall correlated with Insideend and 5 other fields | High correlation |
| Insideend is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields | High correlation |
| Insidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields | High correlation |
| Length is highly overall correlated with Insideend and 1 other field | High correlation |
| Outsideend is highly overall correlated with Length and 3 other fields | High correlation |
| Outsidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields | High correlation |
| PredictedTMHsNumber is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields | High correlation |
| TMhelixend is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields | High correlation |
| TMhelixstart is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields | High correlation |
| POSSIBLENterm has 2890 (19.3%) missing values | Missing |
| Protein_ID has unique values | Unique |
| Expnumberfirst60AAs has 765 (5.1%) zeros | Zeros |
Reproduction
| Analysis started | 2025-07-15 11:52:32.405867 |
|---|---|
| Analysis finished | 2025-07-15 11:52:44.979108 |
| Duration | 12.57 seconds |
| Software version | ydata-profiling v0.0.dev0 |
| Download configuration | config.json |
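The Duration row is the difference between the two timestamps above it. A small sketch of that check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2025-07-15 11:52:32.405867")
finished = datetime.fromisoformat("2025-07-15 11:52:44.979108")

# Subtracting two datetimes yields a timedelta.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```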
Variables
Phage_ID
Text
| Distinct | 14674 |
|---|---|
| Distinct (%) | 97.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.3 MiB |
Length
| Max length | 87 |
|---|---|
| Median length | 86 |
| Mean length | 23.835467 |
| Min length | 6 |
Unique
| Unique | 14361 |
|---|---|
| Unique (%) | 95.7% |
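Distinct and Unique measure different things here: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. That is why Phage_ID can have 14674 distinct values but only 14361 unique ones. A minimal sketch of the distinction, assuming that reading of the two terms and using made-up identifiers rather than rows from the dataset:

```python
from collections import Counter

# Hypothetical identifiers for illustration only.
phage_ids = ["uvig_1", "uvig_2", "uvig_2", "otu_7", "otu_7", "otu_7", "mgv_9"]

counts = Counter(phage_ids)

# Distinct: number of different values.
n_distinct = len(counts)
# Unique: number of values that appear exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(f"distinct={n_distinct}, unique={n_unique}")
```

In this toy list, `uvig_2` and `otu_7` repeat, so they count as distinct values but not as unique ones.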
Sample
| 1st row | MGV-GENOME-0377366 |
|---|---|
| 2nd row | MGV-GENOME-0228589 |
| 3rd row | TemPhD_cluster_54944 |
| 4th row | TemPhD_cluster_21940 |
| 5th row | uvig_280215 |
| Value | Count | Frequency (%) |
|---|---|---|
| mgv-genome-0340415 | 3 | < 0.1% |
| temphd_cluster_8028 | 3 | < 0.1% |
| mycobacterium_phage_gadjet | 3 | < 0.1% |
| otu_72 | 3 | < 0.1% |
| uvig_15691 | 3 | < 0.1% |
| uvig_200799 | 3 | < 0.1% |
| uvig_588980 | 3 | < 0.1% |
| nc_042116.1 | 3 | < 0.1% |
| station168_sur_all_assembly_node_176_length_220308_cov_73.630021 | 3 | < 0.1% |
| temphd_cluster_9569 | 3 | < 0.1% |
| Other values (14664) | 14970 | 99.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 34456 | 9.6% |
| 1 | 18585 | 5.2% |
| 0 | 16049 | 4.5% |
| 3 | 15294 | 4.3% |
| 2 | 14762 | 4.1% |
| E | 12304 | 3.4% |
| 5 | 12134 | 3.4% |
| 4 | 12091 | 3.4% |
| M | 11196 | 3.1% |
| 7 | 10981 | 3.1% |
| Other values (55) | 199680 | 55.8% |
Protein_ID
Text
Unique 
| Distinct | 15000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.3 MiB |
Length
| Max length | 90 |
|---|---|
| Median length | 88 |
| Mean length | 26.651867 |
| Min length | 9 |
Unique
| Unique | 15000 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | MGV-GENOME-0377366_94 |
|---|---|
| 2nd row | MGV-GENOME-0228589_3 |
| 3rd row | TemPhD_cluster_54944_50 |
| 4th row | TemPhD_cluster_21940_29 |
| 5th row | uvig_280215_16 |
| Value | Count | Frequency (%) |
|---|---|---|
| station137_mes_combined_final_node_8849_length_10087_cov_2.107157_9 | 1 | < 0.1% |
| station155_dcm_all_assembly_node_4760_length_17116_cov_5.143016_14 | 1 | < 0.1% |
| mgv-genome-0377366_94 | 1 | < 0.1% |
| mgv-genome-0228589_3 | 1 | < 0.1% |
| temphd_cluster_54944_50 | 1 | < 0.1% |
| temphd_cluster_21940_29 | 1 | < 0.1% |
| uvig_280215_16 | 1 | < 0.1% |
| temphd_cluster_2820_6 | 1 | < 0.1% |
| uvig_396803_67 | 1 | < 0.1% |
| mgv-genome-0085121_16 | 1 | < 0.1% |
| Other values (14990) | 14990 | 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 49101 | 12.3% |
| 1 | 23704 | 5.9% |
| 2 | 18767 | 4.7% |
| 3 | 18764 | 4.7% |
| 0 | 17832 | 4.5% |
| 4 | 14966 | 3.7% |
| 5 | 14590 | 3.6% |
| 6 | 13121 | 3.3% |
| 7 | 12953 | 3.2% |
| 8 | 12495 | 3.1% |
| Other values (55) | 203485 | 50.9% |
Length
Real number (ℝ)
High correlation 
| Distinct | 1150 |
|---|---|
| Distinct (%) | 7.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 216.97947 |
| Minimum | 23 |
| Maximum | 5055 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 23 |
|---|---|
| 5-th percentile | 47 |
| Q1 | 81 |
| median | 128 |
| Q3 | 213 |
| 95-th percentile | 781 |
| Maximum | 5055 |
| Range | 5032 |
| Interquartile range (IQR) | 132 |
Descriptive statistics
| Standard deviation | 283.11148 |
|---|---|
| Coefficient of variation (CV) | 1.3047847 |
| Kurtosis | 30.426127 |
| Mean | 216.97947 |
| Median Absolute Deviation (MAD) | 57 |
| Skewness | 4.3630647 |
| Sum | 3254692 |
| Variance | 80152.111 |
| Monotonicity | Not monotonic |
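Several rows in the Length tables above are derived from other rows. The sketch below recomputes them from the reported values as a consistency check. It assumes the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared), and the variable names are illustrative:

```python
# Values copied from the Length statistics tables in this report.
minimum, maximum = 23, 5055
q1, q3 = 81, 213
mean = 216.97947
std = 283.11148
total, n_rows = 3254692, 15000

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance from the standard deviation
mean_from_sum = total / n_rows    # Mean recomputed from Sum and row count

print(f"Range={value_range}, IQR={iqr}, CV={cv:.7f}")
print(f"Variance={variance:.3f}, Mean={mean_from_sum:.5f}")
```

A CV above 1 together with a skewness of about 4.4 indicates a long right tail: most proteins are short (median 128) while a few reach several thousand residues.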
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 71 | 120 | 0.8% |
| 66 | 114 | 0.8% |
| 107 | 114 | 0.8% |
| 68 | 111 | 0.7% |
| 60 | 110 | 0.7% |
| 128 | 109 | 0.7% |
| 78 | 103 | 0.7% |
| 91 | 102 | 0.7% |
| 69 | 101 | 0.7% |
| 88 | 101 | 0.7% |
| Other values (1140) | 13915 | 92.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 26 | 1 | < 0.1% |
| 27 | 1 | < 0.1% |
| 28 | 2 | < 0.1% |
| 29 | 33 | 0.2% |
| 30 | 29 | 0.2% |
| 31 | 37 | 0.2% |
| 32 | 41 | 0.3% |
| 33 | 33 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5055 | 1 | < 0.1% |
| 4711 | 1 | < 0.1% |
| 3789 | 1 | < 0.1% |
| 3582 | 1 | < 0.1% |
| 3366 | 1 | < 0.1% |
| 3283 | 1 | < 0.1% |
| 3027 | 1 | < 0.1% |
| 2861 | 1 | < 0.1% |
| 2857 | 1 | < 0.1% |
| 2705 | 1 | < 0.1% |
PredictedTMHsNumber
Real number (ℝ)
High correlation 
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8842 |
| Minimum | 1 |
| Maximum | 26 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 26 |
| Range | 25 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.846858 |
|---|---|
| Coefficient of variation (CV) | 0.9801815 |
| Kurtosis | 27.806686 |
| Mean | 1.8842 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.4740398 |
| Sum | 28263 |
| Variance | 3.4108844 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 8806 | 58.7% |
| 2 | 3750 | 25.0% |
| 3 | 1081 | 7.2% |
| 4 | 561 | 3.7% |
| 5 | 211 | 1.4% |
| 6 | 180 | 1.2% |
| 10 | 78 | 0.5% |
| 7 | 66 | 0.4% |
| 8 | 64 | 0.4% |
| 11 | 45 | 0.3% |
| Other values (14) | 158 | 1.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 8806 | 58.7% |
| 2 | 3750 | 25.0% |
| 3 | 1081 | 7.2% |
| 4 | 561 | 3.7% |
| 5 | 211 | 1.4% |
| 6 | 180 | 1.2% |
| 7 | 66 | 0.4% |
| 8 | 64 | 0.4% |
| 9 | 42 | 0.3% |
| 10 | 78 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 26 | 1 | < 0.1% |
| 24 | 1 | < 0.1% |
| 22 | 1 | < 0.1% |
| 21 | 1 | < 0.1% |
| 20 | 8 | 0.1% |
| 19 | 1 | < 0.1% |
| 18 | 11 | 0.1% |
| 17 | 2 | < 0.1% |
| 16 | 12 | 0.1% |
| 15 | 8 | 0.1% |
ExpnumberofAAsinTMHs
Real number (ℝ)
High correlation 
| Distinct | 13589 |
|---|---|
| Distinct (%) | 90.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41.601879 |
| Minimum | 9.06624 |
| Maximum | 558.81059 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 9.06624 |
|---|---|
| 5-th percentile | 17.401948 |
| Q1 | 20.867945 |
| median | 23.1689 |
| Q3 | 44.302967 |
| 95-th percentile | 110.20577 |
| Maximum | 558.81059 |
| Range | 549.74435 |
| Interquartile range (IQR) | 23.435022 |
Descriptive statistics
| Standard deviation | 42.842182 |
|---|---|
| Coefficient of variation (CV) | 1.0298136 |
| Kurtosis | 27.936477 |
| Mean | 41.601879 |
| Median Absolute Deviation (MAD) | 5.542635 |
| Skewness | 4.4509783 |
| Sum | 624028.19 |
| Variance | 1835.4525 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 24.87583 | 19 | 0.1% |
| 18.23661 | 17 | 0.1% |
| 47.86547 | 16 | 0.1% |
| 210.43458 | 16 | 0.1% |
| 108.33627 | 14 | 0.1% |
| 39.48045 | 13 | 0.1% |
| 36.04048 | 13 | 0.1% |
| 108.28257 | 11 | 0.1% |
| 21.42924 | 11 | 0.1% |
| 46.4987 | 10 | 0.1% |
| Other values (13579) | 14860 | 99.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.06624 | 1 | < 0.1% |
| 9.72277 | 1 | < 0.1% |
| 10.14356 | 1 | < 0.1% |
| 10.23364 | 1 | < 0.1% |
| 10.68052 | 1 | < 0.1% |
| 11.23047 | 2 | < 0.1% |
| 11.41795 | 1 | < 0.1% |
| 11.44372 | 1 | < 0.1% |
| 11.49807 | 1 | < 0.1% |
| 11.49934 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 558.81059 | 1 | < 0.1% |
| 527.93568 | 1 | < 0.1% |
| 520.15842 | 1 | < 0.1% |
| 511.60373 | 1 | < 0.1% |
| 494.58563 | 1 | < 0.1% |
| 491.00928 | 1 | < 0.1% |
| 473.96137 | 1 | < 0.1% |
| 473.24722 | 1 | < 0.1% |
| 470.79895 | 1 | < 0.1% |
| 470.76217 | 1 | < 0.1% |
Expnumberfirst60AAs
Real number (ℝ)
Zeros 
| Distinct | 12383 |
|---|---|
| Distinct (%) | 82.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.585516 |
| Minimum | 0 |
| Maximum | 49.17952 |
| Zeros | 765 |
| Zeros (%) | 5.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 16.499458 |
| median | 21.23728 |
| Q3 | 25.40934 |
| 95-th percentile | 41.447687 |
| Maximum | 49.17952 |
| Range | 49.17952 |
| Interquartile range (IQR) | 8.9098825 |
Descriptive statistics
| Standard deviation | 12.224024 |
|---|---|
| Coefficient of variation (CV) | 0.59381673 |
| Kurtosis | -0.46734641 |
| Mean | 20.585516 |
| Median Absolute Deviation (MAD) | 4.53789 |
| Skewness | -0.11228669 |
| Sum | 308782.75 |
| Variance | 149.42677 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 765 | 5.1% |
| 0.00018 | 33 | 0.2% |
| 42.15085 | 23 | 0.2% |
| 24.87583 | 19 | 0.1% |
| 18.23661 | 17 | 0.1% |
| 0.00055 | 16 | 0.1% |
| 25.0819 | 16 | 0.1% |
| 8 × 10⁻⁵ | 15 | 0.1% |
| 38.11529 | 14 | 0.1% |
| 1 × 10⁻⁵ | 14 | 0.1% |
| Other values (12373) | 14068 | 93.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 765 | 5.1% |
| 1 × 10⁻⁵ | 14 | 0.1% |
| 2 × 10⁻⁵ | 8 | 0.1% |
| 3 × 10⁻⁵ | 11 | 0.1% |
| 4 × 10⁻⁵ | 5 | < 0.1% |
| 5 × 10⁻⁵ | 3 | < 0.1% |
| 6 × 10⁻⁵ | 10 | 0.1% |
| 7 × 10⁻⁵ | 4 | < 0.1% |
| 8 × 10⁻⁵ | 15 | 0.1% |
| 9 × 10⁻⁵ | 8 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 49.17952 | 1 | < 0.1% |
| 47.80002 | 1 | < 0.1% |
| 47.7514 | 1 | < 0.1% |
| 47.03513 | 1 | < 0.1% |
| 46.73738 | 1 | < 0.1% |
| 46.71216 | 1 | < 0.1% |
| 46.14895 | 1 | < 0.1% |
| 46.10417 | 1 | < 0.1% |
| 46.02934 | 1 | < 0.1% |
| 45.99137 | 1 | < 0.1% |
TotalprobofNin
Real number (ℝ)
| Distinct | 11967 |
|---|---|
| Distinct (%) | 79.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.59128324 |
| Minimum | 4 × 10⁻⁵ |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 4 × 10⁻⁵ |
|---|---|
| 5-th percentile | 0.0225355 |
| Q1 | 0.2392375 |
| median | 0.69571 |
| Q3 | 0.92887 |
| 95-th percentile | 0.9962505 |
| Maximum | 1 |
| Range | 0.99996 |
| Interquartile range (IQR) | 0.6896325 |
Descriptive statistics
| Standard deviation | 0.35351111 |
|---|---|
| Coefficient of variation (CV) | 0.59787102 |
| Kurtosis | -1.4030931 |
| Mean | 0.59128324 |
| Median Absolute Deviation (MAD) | 0.277735 |
| Skewness | -0.38427659 |
| Sum | 8869.2486 |
| Variance | 0.12497011 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.99854 | 25 | 0.2% |
| 0.56701 | 19 | 0.1% |
| 0.59265 | 17 | 0.1% |
| 0.28216 | 16 | 0.1% |
| 0.03881 | 15 | 0.1% |
| 0.97286 | 14 | 0.1% |
| 0.67222 | 13 | 0.1% |
| 0.86194 | 13 | 0.1% |
| 0.61003 | 11 | 0.1% |
| 0.99474 | 11 | 0.1% |
| Other values (11957) | 14846 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 × 10⁻⁵ | 3 | < 0.1% |
| 5 × 10⁻⁵ | 2 | < 0.1% |
| 7 × 10⁻⁵ | 1 | < 0.1% |
| 0.00011 | 1 | < 0.1% |
| 0.00013 | 2 | < 0.1% |
| 0.00014 | 1 | < 0.1% |
| 0.00018 | 1 | < 0.1% |
| 0.00019 | 1 | < 0.1% |
| 0.0002 | 1 | < 0.1% |
| 0.00021 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 3 | < 0.1% |
| 0.99998 | 7 | < 0.1% |
| 0.99997 | 3 | < 0.1% |
| 0.99996 | 7 | < 0.1% |
| 0.99995 | 5 | < 0.1% |
| 0.99994 | 8 | 0.1% |
| 0.99993 | 5 | < 0.1% |
| 0.99992 | 5 | < 0.1% |
| 0.99991 | 4 | < 0.1% |
| 0.9999 | 4 | < 0.1% |
POSSIBLENterm
Boolean
Constant  Missing 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2890 |
| Missing (%) | 19.3% |
| Memory size | 633.2 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| True | 12110 | 80.7% |
| (Missing) | 2890 | 19.3% |
Insidesource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| TMHMM2.0 | 15000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| M | 45000 | 37.5% |
| T | 15000 | 12.5% |
| H | 15000 | 12.5% |
| 2 | 15000 | 12.5% |
| . | 15000 | 12.5% |
| 0 | 15000 | 12.5% |
Insidestart
Real number (ℝ)
High correlation 
| Distinct | 817 |
|---|---|
| Distinct (%) | 5.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 90.841933 |
| Minimum | 1 |
| Maximum | 3783 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 33 |
| Q3 | 85 |
| 95-th percentile | 461.05 |
| Maximum | 3783 |
| Range | 3782 |
| Interquartile range (IQR) | 84 |
Descriptive statistics
| Standard deviation | 184.49293 |
|---|---|
| Coefficient of variation (CV) | 2.0309226 |
| Kurtosis | 44.750878 |
| Mean | 90.841933 |
| Median Absolute Deviation (MAD) | 32 |
| Skewness | 5.1643474 |
| Sum | 1362629 |
| Variance | 34037.642 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 4962 | 33.1% |
| 27 | 511 | 3.4% |
| 28 | 493 | 3.3% |
| 33 | 419 | 2.8% |
| 24 | 325 | 2.2% |
| 38 | 323 | 2.2% |
| 22 | 244 | 1.6% |
| 43 | 183 | 1.2% |
| 25 | 182 | 1.2% |
| 62 | 146 | 1.0% |
| Other values (807) | 7212 | 48.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 4962 | 33.1% |
| 19 | 6 | < 0.1% |
| 20 | 7 | < 0.1% |
| 21 | 2 | < 0.1% |
| 22 | 244 | 1.6% |
| 23 | 145 | 1.0% |
| 24 | 325 | 2.2% |
| 25 | 182 | 1.2% |
| 26 | 51 | 0.3% |
| 27 | 511 | 3.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3783 | 1 | < 0.1% |
| 3223 | 1 | < 0.1% |
| 2852 | 1 | < 0.1% |
| 2762 | 1 | < 0.1% |
| 2629 | 1 | < 0.1% |
| 2201 | 1 | < 0.1% |
| 2124 | 1 | < 0.1% |
| 2073 | 1 | < 0.1% |
| 2037 | 1 | < 0.1% |
| 2013 | 1 | < 0.1% |
Insideend
Real number (ℝ)
High correlation 
| Distinct | 889 |
|---|---|
| Distinct (%) | 5.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 136.77793 |
| Minimum | 1 |
| Maximum | 3789 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 36 |
| median | 86 |
| Q3 | 150 |
| 95-th percentile | 510.05 |
| Maximum | 3789 |
| Range | 3788 |
| Interquartile range (IQR) | 114 |
Descriptive statistics
| Standard deviation | 197.99709 |
|---|---|
| Coefficient of variation (CV) | 1.4475807 |
| Kurtosis | 38.293389 |
| Mean | 136.77793 |
| Median Absolute Deviation (MAD) | 56 |
| Skewness | 4.6987253 |
| Sum | 2051669 |
| Variance | 39202.849 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1147 | 7.6% |
| 12 | 438 | 2.9% |
| 4 | 398 | 2.7% |
| 11 | 279 | 1.9% |
| 20 | 244 | 1.6% |
| 19 | 153 | 1.0% |
| 8 | 142 | 0.9% |
| 1 | 117 | 0.8% |
| 60 | 115 | 0.8% |
| 67 | 111 | 0.7% |
| Other values (879) | 11856 | 79.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 117 | 0.8% |
| 2 | 19 | 0.1% |
| 4 | 398 | 2.7% |
| 6 | 1147 | 7.6% |
| 8 | 142 | 0.9% |
| 10 | 6 | < 0.1% |
| 11 | 279 | 1.9% |
| 12 | 438 | 2.9% |
| 15 | 27 | 0.2% |
| 16 | 55 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3789 | 1 | < 0.1% |
| 3582 | 1 | < 0.1% |
| 2861 | 1 | < 0.1% |
| 2857 | 1 | < 0.1% |
| 2652 | 1 | < 0.1% |
| 2410 | 1 | < 0.1% |
| 2373 | 1 | < 0.1% |
| 2334 | 1 | < 0.1% |
| 2209 | 1 | < 0.1% |
| 2026 | 1 | < 0.1% |
TMhelixsource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| TMHMM2.0 | 15000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| M | 45000 | 37.5% |
| T | 15000 | 12.5% |
| H | 15000 | 12.5% |
| 2 | 15000 | 12.5% |
| . | 15000 | 12.5% |
| 0 | 15000 | 12.5% |
TMhelixstart
Real number (ℝ)
High correlation 
| Distinct | 825 |
|---|---|
| Distinct (%) | 5.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 94.883667 |
| Minimum | 2 |
| Maximum | 3763 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 10 |
| median | 37 |
| Q3 | 90 |
| 95-th percentile | 468 |
| Maximum | 3763 |
| Range | 3761 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 186.11565 |
|---|---|
| Coefficient of variation (CV) | 1.9615141 |
| Kurtosis | 43.503572 |
| Mean | 94.883667 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | 5.1199736 |
| Sum | 1423255 |
| Variance | 34639.034 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 1153 | 7.7% |
| 5 | 1011 | 6.7% |
| 4 | 976 | 6.5% |
| 10 | 534 | 3.6% |
| 13 | 472 | 3.1% |
| 15 | 345 | 2.3% |
| 20 | 301 | 2.0% |
| 12 | 282 | 1.9% |
| 21 | 244 | 1.6% |
| 39 | 188 | 1.3% |
| Other values (815) | 9494 | 63.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 117 | 0.8% |
| 3 | 19 | 0.1% |
| 4 | 976 | 6.5% |
| 5 | 1011 | 6.7% |
| 6 | 82 | 0.5% |
| 7 | 1153 | 7.7% |
| 9 | 142 | 0.9% |
| 10 | 534 | 3.6% |
| 11 | 22 | 0.1% |
| 12 | 282 | 1.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3763 | 1 | < 0.1% |
| 3200 | 1 | < 0.1% |
| 2829 | 1 | < 0.1% |
| 2739 | 1 | < 0.1% |
| 2609 | 1 | < 0.1% |
| 2335 | 1 | < 0.1% |
| 2178 | 1 | < 0.1% |
| 2101 | 1 | < 0.1% |
| 2027 | 1 | < 0.1% |
| 2017 | 1 | < 0.1% |
TMhelixend
Real number (ℝ)
High correlation 
| Distinct | 842 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 115.74207 |
| Minimum | 18 |
| Maximum | 3782 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 24 |
| Q1 | 31 |
| median | 58 |
| Q3 | 111 |
| 95-th percentile | 488.05 |
| Maximum | 3782 |
| Range | 3764 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 186.34886 |
|---|---|
| Coefficient of variation (CV) | 1.6100357 |
| Kurtosis | 43.331369 |
| Mean | 115.74207 |
| Median Absolute Deviation (MAD) | 31 |
| Skewness | 5.1104464 |
| Sum | 1736131 |
| Variance | 34725.896 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 29 | 840 | 5.6% |
| 26 | 694 | 4.6% |
| 27 | 690 | 4.6% |
| 24 | 472 | 3.1% |
| 32 | 423 | 2.8% |
| 35 | 350 | 2.3% |
| 23 | 271 | 1.8% |
| 37 | 257 | 1.7% |
| 34 | 254 | 1.7% |
| 42 | 251 | 1.7% |
| Other values (832) | 10498 | 70.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 3 | < 0.1% |
| 19 | 23 | 0.2% |
| 20 | 10 | 0.1% |
| 21 | 209 | 1.4% |
| 22 | 183 | 1.2% |
| 23 | 271 | 1.8% |
| 24 | 472 | 3.1% |
| 25 | 99 | 0.7% |
| 26 | 694 | 4.6% |
| 27 | 690 | 4.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3782 | 1 | < 0.1% |
| 3222 | 1 | < 0.1% |
| 2851 | 1 | < 0.1% |
| 2761 | 1 | < 0.1% |
| 2628 | 1 | < 0.1% |
| 2357 | 1 | < 0.1% |
| 2200 | 1 | < 0.1% |
| 2123 | 1 | < 0.1% |
| 2049 | 1 | < 0.1% |
| 2036 | 1 | < 0.1% |
Outsidesource
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TMHMM2.0 |
|---|---|
| 2nd row | TMHMM2.0 |
| 3rd row | TMHMM2.0 |
| 4th row | TMHMM2.0 |
| 5th row | TMHMM2.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| TMHMM2.0 | 15000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| M | 45000 | 37.5% |
| T | 15000 | 12.5% |
| H | 15000 | 12.5% |
| 2 | 15000 | 12.5% |
| . | 15000 | 12.5% |
| 0 | 15000 | 12.5% |
Outsidestart
Real number (ℝ)
High correlation 
| Distinct | 766 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 88.063533 |
| Minimum | 1 |
| Maximum | 2720 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 36 |
| Q3 | 85 |
| 95-th percentile | 410.05 |
| Maximum | 2720 |
| Range | 2719 |
| Interquartile range (IQR) | 84 |
Descriptive statistics
| Standard deviation | 168.82232 |
|---|---|
| Coefficient of variation (CV) | 1.9170514 |
| Kurtosis | 29.391507 |
| Mean | 88.063533 |
| Median Absolute Deviation (MAD) | 35 |
| Skewness | 4.5598674 |
| Sum | 1320953 |
| Variance | 28500.976 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 3844 | 25.6% |
| 30 | 1038 | 6.9% |
| 36 | 531 | 3.5% |
| 25 | 530 | 3.5% |
| 27 | 398 | 2.7% |
| 28 | 389 | 2.6% |
| 35 | 322 | 2.1% |
| 44 | 319 | 2.1% |
| 32 | 261 | 1.7% |
| 43 | 200 | 1.3% |
| Other values (756) | 7168 | 47.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 3844 | 25.6% |
| 17 | 3 | < 0.1% |
| 18 | 2 | < 0.1% |
| 19 | 5 | < 0.1% |
| 20 | 70 | 0.5% |
| 21 | 18 | 0.1% |
| 22 | 52 | 0.3% |
| 23 | 141 | 0.9% |
| 24 | 33 | 0.2% |
| 25 | 530 | 3.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2720 | 1 | < 0.1% |
| 2358 | 1 | < 0.1% |
| 2087 | 1 | < 0.1% |
| 2050 | 1 | < 0.1% |
| 2036 | 1 | < 0.1% |
| 2032 | 1 | < 0.1% |
| 1953 | 1 | < 0.1% |
| 1883 | 1 | < 0.1% |
| 1667 | 1 | < 0.1% |
| 1591 | 1 | < 0.1% |
Outsideend
Real number (ℝ)
High correlation 
| Distinct | 1126 |
|---|---|
| Distinct (%) | 7.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 174.0852 |
| Minimum | 3 |
| Maximum | 5055 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 234.4 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 33 |
| median | 80 |
| Q3 | 181.25 |
| 95-th percentile | 728 |
| Maximum | 5055 |
| Range | 5052 |
| Interquartile range (IQR) | 148.25 |
Descriptive statistics
| Standard deviation | 287.17438 |
|---|---|
| Coefficient of variation (CV) | 1.6496197 |
| Kurtosis | 29.473674 |
| Mean | 174.0852 |
| Median Absolute Deviation (MAD) | 61 |
| Skewness | 4.286865 |
| Sum | 2611278 |
| Variance | 82469.126 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 976 | 6.5% |
| 4 | 613 | 4.1% |
| 9 | 534 | 3.6% |
| 14 | 345 | 2.3% |
| 38 | 169 | 1.1% |
| 19 | 148 | 1.0% |
| 33 | 132 | 0.9% |
| 32 | 126 | 0.8% |
| 39 | 116 | 0.8% |
| 28 | 112 | 0.7% |
| Other values (1116) | 11729 | 78.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 976 | 6.5% |
| 4 | 613 | 4.1% |
| 5 | 82 | 0.5% |
| 6 | 6 | < 0.1% |
| 9 | 534 | 3.6% |
| 10 | 16 | 0.1% |
| 11 | 3 | < 0.1% |
| 12 | 34 | 0.2% |
| 14 | 345 | 2.3% |
| 16 | 4 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5055 | 1 | < 0.1% |
| 4711 | 1 | < 0.1% |
| 3762 | 1 | < 0.1% |
| 3366 | 1 | < 0.1% |
| 3283 | 1 | < 0.1% |
| 3199 | 1 | < 0.1% |
| 3027 | 1 | < 0.1% |
| 2828 | 1 | < 0.1% |
| 2738 | 1 | < 0.1% |
| 2705 | 1 | < 0.1% |
Phage_source
Categorical
| Distinct | 13 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1008.1 KiB |
Length
| Max length | 8 |
|---|---|
| Median length | 3 |
| Mean length | 3.8216667 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | MGV |
|---|---|
| 2nd row | MGV |
| 3rd row | TemPhD |
| 4th row | TemPhD |
| 5th row | GPD |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| MGV | 4368 | 29.1% |
| GPD | 3948 | 26.3% |
| TemPhD | 2350 | 15.7% |
| GOV2 | 2062 | 13.7% |
| CHVD | 1056 | 7.0% |
| GVD | 443 | 3.0% |
| RefSeq | 229 | 1.5% |
| IGVD | 176 | 1.2% |
| PhagesDB | 170 | 1.1% |
| Genbank | 106 | 0.7% |
| Other values (3) | 92 | 0.6% |
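The Frequency (%) column in the table above is each count divided by the total number of rows. A short sketch that recomputes the shares from the listed counts, assuming one-decimal rounding (the dictionary name is illustrative):

```python
# Counts copied from the Phage_source "Common Values" table.
source_counts = {
    "MGV": 4368, "GPD": 3948, "TemPhD": 2350, "GOV2": 2062, "CHVD": 1056,
    "GVD": 443, "RefSeq": 229, "IGVD": 176, "PhagesDB": 170, "Genbank": 106,
    "Other values (3)": 92,
}

# The counts cover every row, so their sum is the number of observations.
total = sum(source_counts.values())
frequencies = {name: 100 * count / total for name, count in source_counts.items()}

for name, pct in frequencies.items():
    print(f"{name:18s} {pct:5.1f}%")
```

The four largest sources (MGV, GPD, TemPhD, GOV2) together account for roughly 85% of the proteins, so any per-source comparison will be dominated by them.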
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| G | 11103 | 19.4% |
| V | 8177 | 14.3% |
| D | 8165 | 14.2% |
| P | 6468 | 11.3% |
| M | 4377 | 7.6% |
| e | 3084 | 5.4% |
| h | 2520 | 4.4% |
| T | 2422 | 4.2% |
| m | 2350 | 4.1% |
| O | 2062 | 3.6% |
| Other values (18) | 6597 | 11.5% |
Interactions
Correlations
| | Expnumberfirst60AAs | ExpnumberofAAsinTMHs | Insideend | Insidestart | Length | Outsideend | Outsidestart | Phage_source | PredictedTMHsNumber | TMhelixend | TMhelixstart | TotalprobofNin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Expnumberfirst60AAs | 1.000 | 0.407 | -0.165 | 0.136 | -0.358 | -0.334 | -0.130 | 0.048 | 0.356 | -0.145 | -0.164 | 0.157 |
| ExpnumberofAAsinTMHs | 0.407 | 1.000 | 0.513 | 0.732 | 0.256 | 0.310 | 0.581 | 0.054 | 0.874 | 0.691 | 0.671 | 0.090 |
| Insideend | -0.165 | 0.513 | 1.000 | 0.778 | 0.512 | 0.108 | 0.294 | 0.042 | 0.532 | 0.669 | 0.662 | -0.250 |
| Insidestart | 0.136 | 0.732 | 0.778 | 1.000 | 0.272 | 0.098 | 0.251 | 0.040 | 0.784 | 0.658 | 0.652 | -0.180 |
| Length | -0.358 | 0.256 | 0.512 | 0.272 | 1.000 | 0.737 | 0.441 | 0.036 | 0.252 | 0.476 | 0.480 | 0.001 |
| Outsideend | -0.334 | 0.310 | 0.108 | 0.098 | 0.737 | 1.000 | 0.731 | 0.037 | 0.296 | 0.596 | 0.608 | 0.246 |
| Outsidestart | -0.130 | 0.581 | 0.294 | 0.251 | 0.441 | 0.731 | 1.000 | 0.052 | 0.613 | 0.766 | 0.768 | 0.281 |
| Phage_source | 0.048 | 0.054 | 0.042 | 0.040 | 0.036 | 0.037 | 0.052 | 1.000 | 0.048 | 0.041 | 0.041 | 0.029 |
| PredictedTMHsNumber | 0.356 | 0.874 | 0.532 | 0.784 | 0.252 | 0.296 | 0.613 | 0.048 | 1.000 | 0.677 | 0.679 | 0.101 |
| TMhelixend | -0.145 | 0.691 | 0.669 | 0.658 | 0.476 | 0.596 | 0.766 | 0.041 | 0.677 | 1.000 | 0.994 | 0.093 |
| TMhelixstart | -0.164 | 0.671 | 0.662 | 0.652 | 0.480 | 0.608 | 0.768 | 0.041 | 0.679 | 0.994 | 1.000 | 0.102 |
| TotalprobofNin | 0.157 | 0.090 | -0.250 | -0.180 | 0.001 | 0.246 | 0.281 | 0.029 | 0.101 | 0.093 | 0.102 | 1.000 |
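To read the strongest relationships off the matrix, one option is to filter the variable pairs by absolute correlation. The sketch below copies a handful of entries from the table above into a dictionary and lists those at or above a cutoff. The helper `strong_pairs` and the value `THRESHOLD = 0.75` are illustrative assumptions, not the rule the profiler uses to raise its "High correlation" alerts.

```python
# A few upper-triangle entries copied from the correlation table above.
CORR = {
    ("TMhelixstart", "TMhelixend"): 0.994,
    ("ExpnumberofAAsinTMHs", "PredictedTMHsNumber"): 0.874,
    ("Insidestart", "PredictedTMHsNumber"): 0.784,
    ("Insidestart", "Insideend"): 0.778,
    ("Outsidestart", "TMhelixstart"): 0.768,
    ("Length", "Outsideend"): 0.737,
    ("Expnumberfirst60AAs", "Length"): -0.358,
    ("Phage_source", "Length"): 0.036,
}


def strong_pairs(corr, threshold):
    """Return (pair, value) entries with |value| >= threshold, strongest first."""
    hits = [(pair, value) for pair, value in corr.items() if abs(value) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


THRESHOLD = 0.75  # hypothetical cutoff chosen for illustration only

for (a, b), value in strong_pairs(CORR, THRESHOLD):
    print(f"{a} ~ {b}: {value:+.3f}")
```

Taking the absolute value keeps strongly negative relationships in the result; none of the entries copied here is negative enough to pass the cutoff.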
Missing values
Sample
| | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1392473 | MGV-GENOME-0377366 | MGV-GENOME-0377366_94 | 107 | 2 | 39.89314 | 39.89314 | 0.99747 | True | TMHMM2.0 | 53.0 | 107.0 | TMHMM2.0 | 33.0 | 52.0 | TMHMM2.0 | 30.0 | 32.0 | MGV |
| 1225655 | MGV-GENOME-0228589 | MGV-GENOME-0228589_3 | 204 | 2 | 45.03472 | 42.87273 | 0.99861 | True | TMHMM2.0 | 62.0 | 204.0 | TMHMM2.0 | 39.0 | 61.0 | TMHMM2.0 | 30.0 | 38.0 | MGV |
| 2065853 | TemPhD_cluster_54944 | TemPhD_cluster_54944_50 | 108 | 1 | 22.15821 | 21.19442 | 0.14212 | True | TMHMM2.0 | 43.0 | 108.0 | TMHMM2.0 | 24.0 | 42.0 | TMHMM2.0 | 1.0 | 23.0 | TemPhD |
| 1828787 | TemPhD_cluster_21940 | TemPhD_cluster_21940_29 | 41 | 1 | 22.60786 | 22.60786 | 0.35682 | True | TMHMM2.0 | 38.0 | 41.0 | TMHMM2.0 | 15.0 | 37.0 | TMHMM2.0 | 1.0 | 14.0 | TemPhD |
| 575773 | uvig_280215 | uvig_280215_16 | 571 | 1 | 22.92261 | 0.00000 | 0.91546 | NaN | TMHMM2.0 | 1.0 | 169.0 | TMHMM2.0 | 170.0 | 192.0 | TMHMM2.0 | 193.0 | 571.0 | GPD |
| 1869556 | TemPhD_cluster_2820 | TemPhD_cluster_2820_6 | 66 | 1 | 19.76984 | 19.76956 | 0.91180 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 26.0 | TMHMM2.0 | 27.0 | 66.0 | TemPhD |
| 717310 | uvig_396803 | uvig_396803_67 | 183 | 1 | 18.65302 | 18.45494 | 0.74857 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 25.0 | TMHMM2.0 | 26.0 | 183.0 | GPD |
| 1193825 | MGV-GENOME-0085121 | MGV-GENOME-0085121_16 | 98 | 2 | 42.31297 | 40.90727 | 0.98844 | True | TMHMM2.0 | 62.0 | 98.0 | TMHMM2.0 | 44.0 | 61.0 | TMHMM2.0 | 35.0 | 43.0 | MGV |
| 1524904 | MGV-GENOME-0378116 | MGV-GENOME-0378116_30 | 122 | 2 | 41.78412 | 23.94402 | 0.91384 | True | TMHMM2.0 | 81.0 | 122.0 | TMHMM2.0 | 58.0 | 80.0 | TMHMM2.0 | 44.0 | 57.0 | MGV |
| 1051224 | MGV-GENOME-0357329 | MGV-GENOME-0357329_12 | 169 | 3 | 64.41947 | 27.06577 | 0.30090 | True | TMHMM2.0 | 119.0 | 169.0 | TMHMM2.0 | 96.0 | 118.0 | TMHMM2.0 | 93.0 | 95.0 | MGV |
| | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1441688 | MGV-GENOME-0273252 | MGV-GENOME-0273252_29 | 79 | 2 | 40.91968 | 40.91747 | 0.93822 | True | TMHMM2.0 | 54.0 | 79.0 | TMHMM2.0 | 34.0 | 53.0 | TMHMM2.0 | 25.0 | 33.0 | MGV |
| 2103306 | TemPhD_cluster_57437 | TemPhD_cluster_57437_56 | 98 | 3 | 62.49815 | 36.02355 | 0.99237 | True | TMHMM2.0 | 67.0 | 72.0 | TMHMM2.0 | 73.0 | 95.0 | TMHMM2.0 | 96.0 | 98.0 | TemPhD |
| 1091401 | MGV-GENOME-0271360 | MGV-GENOME-0271360_27 | 538 | 3 | 70.80607 | 15.07114 | 0.22167 | True | TMHMM2.0 | 521.0 | 538.0 | TMHMM2.0 | 503.0 | 520.0 | TMHMM2.0 | 489.0 | 502.0 | MGV |
| 984303 | MGV-GENOME-0355129 | MGV-GENOME-0355129_34 | 151 | 2 | 48.30061 | 4.82098 | 0.87113 | NaN | TMHMM2.0 | 151.0 | 151.0 | TMHMM2.0 | 131.0 | 150.0 | TMHMM2.0 | 96.0 | 130.0 | MGV |
| 1079974 | MGV-GENOME-4416057 | MGV-GENOME-4416057_42 | 148 | 2 | 37.63759 | 19.39025 | 0.07251 | True | TMHMM2.0 | 25.0 | 119.0 | TMHMM2.0 | 120.0 | 142.0 | TMHMM2.0 | 143.0 | 148.0 | MGV |
| 198485 | uvig_2223 | uvig_2223_77 | 169 | 3 | 64.68193 | 26.04394 | 0.22898 | True | TMHMM2.0 | 119.0 | 169.0 | TMHMM2.0 | 96.0 | 118.0 | TMHMM2.0 | 93.0 | 95.0 | GPD |
| 967090 | MGV-GENOME-0379330 | MGV-GENOME-0379330_317 | 225 | 1 | 22.53788 | 22.46349 | 0.09151 | True | TMHMM2.0 | 51.0 | 225.0 | TMHMM2.0 | 28.0 | 50.0 | TMHMM2.0 | 1.0 | 27.0 | MGV |
| 2171878 | TemPhD_cluster_8005 | TemPhD_cluster_8005_19 | 97 | 2 | 44.01212 | 24.09761 | 0.99159 | True | TMHMM2.0 | 81.0 | 97.0 | TMHMM2.0 | 58.0 | 80.0 | TMHMM2.0 | 44.0 | 57.0 | TemPhD |
| 1789301 | TemPhD_cluster_16210 | TemPhD_cluster_16210_56 | 58 | 1 | 21.82124 | 21.82124 | 0.11086 | True | TMHMM2.0 | 27.0 | 58.0 | TMHMM2.0 | 4.0 | 26.0 | TMHMM2.0 | 1.0 | 3.0 | TemPhD |
| 2770945 | Station155_DCM_ALL_assembly_NODE_4760_length_17116_cov_5.143016 | Station155_DCM_ALL_assembly_NODE_4760_length_17116_cov_5.143016_14 | 77 | 1 | 21.46824 | 11.34032 | 0.93221 | True | TMHMM2.0 | 1.0 | 49.0 | TMHMM2.0 | 50.0 | 72.0 | TMHMM2.0 | 73.0 | 77.0 | GOV2 |